Abstract

Although browsing on the World Wide Web (WWW) feels like an anonymous activity, it is hardly that way. Website administrators generally get a lot of information about users and their browsing behavior, enabling such things as one-to-one marketing. Even more information is available to Internet service providers whose HTTP proxy servers may keep track of every Website visited by their subscribers. Similarly, it is difficult to publish data on the Web without revealing the corresponding HTTP server name or IP address. In this situation, privacy protection and anonymity services for the WWW are becoming increasingly important fields of study. This paper overviews and briefly discusses some technologies that are available today and that can be used to provide support for anonymous browsing and anonymous publishing on the WWW. ©2000 Elsevier Science B.V. All rights reserved.
1. Introduction

Today, activities on the World Wide Web (WWW) are increasingly subject to observation. For example, as Web-based electronic commerce becomes more prevalent, the browsing behavior of Web users reveals individual shopping habits and spending patterns, as well as other data that people have traditionally considered to be personal and private. Similarly, the Web is becoming an important source for information gathering. In a competitive environment, a company may wish to protect its current research interests. However, monitoring HTTP traffic may indicate the company's primary focus. By keeping Web browsing characteristics private, the company's interests may be adequately protected, too. Finally, some electronic payment systems allow secure transactions over the Internet while preserving the untraceability that cash allows. However, if digital cash is transmitted over a channel that identifies both the purchaser and the merchant, the transaction may no longer stay anonymous.

In general, there are three types of anonymous communication properties that can be provided individually or combined [3]:

• Sender anonymity

• Receiver anonymity

• Unlinkability of sender and receiver (connection anonymity)

In short, sender anonymity means that the identity of the party who sent out a message



0167-739X/00/$ - see front matter ©2000 Elsevier Science B.V. All rights reserved.
PII: S0167-739X(99)00062-X






R. Oppliger / Future Generation Computer Systems 16 (2000) 379-391




The service is freely available to the public and accessible at URL http://janus.fernuni-hagen.de. It provides anonymity services for both browsers and Web publishers (servers):

• In order to provide anonymity services for the browser, the JANUS service acts as a proxy service. It accepts requests from arbitrary browsers, removes all data that may reveal information about the requesting user, and forwards the request to the server. Similarly, the server's response is relayed back to the browser. In this way, JANUS is conceptually similar to the Anonymizer and related services for anonymous browsing.

• In order to provide anonymity services for Web publishers, and thus to support anonymous publishing, the JANUS service is able to encrypt and decrypt URLs on the fly in such a way that they can be used as references to a server. More precisely, if a request with an encrypted URL occurs, JANUS is able to decrypt the URL and forward the request to the server, without enabling the user to learn the decrypted URL. Similarly, all references in the server's response are again encrypted before the response is forwarded to the browser. The ability to hide the server's IP address or host name is the main advantage of using the JANUS service; in fact, it is the feature that supports anonymous publishing on the Web.
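The browser-side half of JANUS (like the Anonymizer) essentially forwards a request after removing whatever identifies the user. A minimal sketch of the header-scrubbing step follows, assuming a request's headers are held in a plain dictionary; the set of headers treated as identifying is our own hypothetical choice, not the list JANUS actually used.

```python
from typing import Dict

# Hypothetical choice of request headers that can identify the user or the page they came from.
IDENTIFYING_HEADERS = {"from", "referer", "cookie", "user-agent", "x-forwarded-for"}


def scrub_request_headers(headers: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of `headers` without identifying fields (names compared case-insensitively).

    Forwarding the request over the proxy's own connection, which is what hides the
    browser's IP address from the origin server, is not shown here.
    """
    return {
        name: value
        for name, value in headers.items()
        if name.lower() not in IDENTIFYING_HEADERS
    }
```

Comparing header names in lower case matters because HTTP header names are case-insensitive, so `Cookie` and `cookie` must both be dropped.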

URL encryption and decryption is a suitable application for public key cryptography: everybody should be able to encrypt and publish a URL, whereas only the JANUS service should be able to decrypt it. The encryption is based on the RSA algorithm with 768-bit keys. For example, using JANUS to encrypt the URL http://www.ifi.unizh.ch/~oppliger results in the following expression:

http://janus.fernuni-hagen.de/janus_encrypted/MTA
mJIaWc-}-bdgumvni5xnsaYsQ6MTyg-hVQnJpoHW3 
+TDtb04ir$6gcAFlwdtEVrGhvNR8rSic2nbsK0D61$ 
3mqnJmi3LCY IlfTSgRN 1 5yEOp EserUoAgy5i4LUk 
VZ<;cpWk= 

Consequently, I could publish this expression in order to allow users to visit my homepage while staying anonymous. When a user requested this encrypted URL, JANUS would decrypt the request to see http://www.ifi.unizh.ch/~oppliger. It would then forward the request and return the result, encrypting and rewriting all of the URLs in the page so that your network logs show only a connection to JANUS, and even the URLs recorded in the logs do not reveal any information about the origin HTTP servers. On the other side, the server logs at www.ifi.unizh.ch only reveal that the homepage was accessed by corona.fernuni-hagen.de (or any other host that currently runs the JANUS service).
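To make the public key idea concrete, here is a toy Python sketch of encrypting and decrypting a URL with RSA. It is not the JANUS code: it uses unpadded ("textbook") RSA, which is insecure in practice, and the key generation, the URL-safe Base64 encoding of the ciphertext, and the `janus.example` host in the usage lines are our own assumptions; only the 768-bit key size comes from the text.

```python
import base64
import secrets
from typing import Tuple

Key = Tuple[int, int]  # (modulus n, exponent)


def is_probable_prime(n: int, rounds: int = 40) -> bool:
    """Miller-Rabin probabilistic primality test."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37):
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:
        d, s = d // 2, s + 1
    for _ in range(rounds):
        x = pow(secrets.randbelow(n - 3) + 2, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # this base witnesses that n is composite
    return True


def random_prime(bits: int) -> int:
    while True:
        # Set the top bit (full size) and the bottom bit (odd).
        candidate = secrets.randbits(bits) | (1 << (bits - 1)) | 1
        if is_probable_prime(candidate):
            return candidate


def generate_keypair(bits: int = 768, e: int = 65537) -> Tuple[Key, Key]:
    while True:
        p, q = random_prime(bits // 2), random_prime(bits // 2)
        phi = (p - 1) * (q - 1)
        if p != q and phi % e != 0:  # e is prime, so this ensures gcd(e, phi) == 1
            n = p * q
            return (n, e), (n, pow(e, -1, phi))


def encrypt_url(url: str, public_key: Key) -> str:
    """Anyone holding the public key can do this (textbook RSA, no padding)."""
    n, e = public_key
    m = int.from_bytes(url.encode("utf-8"), "big")
    if m >= n:
        raise ValueError("URL too long for a single unpadded RSA block")
    c = pow(m, e, n)
    return base64.urlsafe_b64encode(c.to_bytes((n.bit_length() + 7) // 8, "big")).decode("ascii")


def decrypt_url(token: str, private_key: Key) -> str:
    """Only the holder of the private key (here, the JANUS service) can do this."""
    n, d = private_key
    m = pow(int.from_bytes(base64.urlsafe_b64decode(token), "big"), d, n)
    return m.to_bytes((m.bit_length() + 7) // 8, "big").decode("utf-8")


if __name__ == "__main__":
    public, private = generate_keypair()
    token = encrypt_url("http://www.ifi.unizh.ch/~oppliger", public)
    print("http://janus.example/janus_encrypted/" + token)  # hypothetical published form
    print(decrypt_url(token, private))
```

The asymmetry is the point: the public key can be handed to every Webmaster who wants to publish an encrypted URL, while only the service holding the private key can map the token back to the real server.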

Nevertheless, there are at least three limitations and 
shortcomings that should be kept in mind when using 
the JANUS service: 

• First, it is important to note that only the URLs are 
encrypted, so if somebody eavesdrops on the actual 
data stream, he will not be fooled. 

• Secondly, the standard cautions about single-hop 
forwarding services apply: coordinated and well- 
placed sniffers could determine the mapping for a 
site. 

• Thirdly, Webmasters must have some means of publishing encrypted URLs.

The above-mentioned limitations and shortcomings 
of the JANUS service are addressed in a technology 
overviewed and briefly discussed next. 

5.2. TAZ servers and the rewebber network

More recently, Ian Goldberg and David Wagner from the University of California at Berkeley have developed a more comprehensive and sophisticated approach to the anonymous publishing problem [12]. The technology they propose can be viewed as a generalization of the JANUS service (in the same way that an anonymous remailer network generalizes the anon.penet.fi service). In fact, the researchers suggest using more than one proxy server to collectively disguise the server's location, and encrypting the entire data stream between the requesting browser and the origin HTTP server. In addition, they suggest the establishment of a directory service for encrypted URLs.

More precisely, the term 'rewebber' is used to refer to a proxy server that essentially implements the JANUS service's core functionality plus the ability to encrypt the data traffic. In addition, each rewebber is also able to understand 'nested' URLs (i.e., URLs of the form http://proxy.com/http://www.realsite.com). Most existing HTTP proxy servers have this ability. The basic idea is to publish a URL such as the one given above (which points to proxy.com instead of www.realsite.com), and to use public key cryptography to encrypt the real server name (the second part of the URL) so that only the rewebber can decrypt and actually see it. Encrypted URLs are also referred to as locators. In the example given above, the requested URL would look something like http://proxy.com/!RFkK4J... (RFkK4J... being the encrypted URL or locator for the URL http://www.realsite.com). The fact that the URL is encrypted is indicated by the leading ! instead of the expected http://. The rewebber at proxy.com, upon receiving the locator, would first decrypt it with its private key, and then proceed to retrieve the nested URL in the normal fashion. Obviously, this mechanism can be iterated over more than one rewebber. All currently installed and operating rewebbers are collectively referred to as the rewebber network.
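The request format just described lets a rewebber decide what to do from the first character after its own host name. A minimal sketch of that dispatch follows; the function name is hypothetical, `RFkK4J` is the truncated token from the text used only as a placeholder, and the decryption step itself is not shown.

```python
from typing import Tuple


def classify_rewebber_request(request_url: str) -> Tuple[str, str]:
    """Split 'http://proxy.com/<rest>' and report what the rewebber should do with <rest>."""
    parts = request_url.split("/", 3)  # ['http:', '', 'proxy.com', '<rest>']
    if len(parts) < 4 or not parts[3]:
        raise ValueError("no nested URL or locator in request")
    rest = parts[3]
    if rest.startswith("!"):
        return "locator", rest[1:]  # must first be decrypted with the rewebber's private key
    if rest.startswith("http://"):
        return "nested", rest  # ordinary nested URL: fetch it in the normal fashion
    raise ValueError("expected a leading '!' or 'http://'")
```

Splitting on the first three slashes only, rather than parsing the whole string as one URL, keeps the nested URL (which contains its own `//`) intact.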

So far, the mechanism hides the real location of the origin HTTP server from the browser, but it still has some flaws. First of all, once the browser has retrieved the resource from the rewebber, it could use one of the more powerful WWW search engines to try to find where the resource originally came from. This problem can be solved by encrypting the resource before storing it on the server. Thus, if the resource is accessed directly (e.g., through a WWW search engine), it will look like random data. In their implementation of a rewebber network, the researchers used the DESX encryption algorithm.^ The DESX key is given to the rewebber in the encrypted part of the locator; that is, when the rewebber decrypts the locator, it finds not only a URL to retrieve, but also a DESX key with which to decrypt the resource thus retrieved. It then passes the decrypted resource back to the browser.
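For reference, DESX (Rivest's extension of DES) surrounds a single DES encryption with two XOR "whitening" steps. With a 56-bit DES key $k$ and two 64-bit whitening keys $k_1$ and $k_2$, a 64-bit block $x$ is encrypted, and a ciphertext block $y$ decrypted, as follows; a complete DESX key of the kind carried in the locator therefore comprises $56 + 64 + 64 = 184$ bits:

```latex
\mathrm{DESX}_{k,k_1,k_2}(x) = k_2 \oplus \mathrm{DES}_k(k_1 \oplus x),
\qquad
x = k_1 \oplus \mathrm{DES}^{-1}_k(k_2 \oplus y)
```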

The technique of encrypting a resource stored at the server has another benefit: the resource can be padded in size before being encrypted, and the rewebber will truncate the resource to its original size before passing it back to the browser. The reason for this is similar to that for encrypting the resource in the first place. If the retrieved resource is 1234 bytes long, for example, a WWW search for encrypted resources near that size would quickly narrow down the possible choices. To thwart this, one can always add random padding to the end of encrypted resources so that their total length is one of a handful of fixed sizes, such as 10, 20, 40, or 80 kbytes.

^ The DESX encryption algorithm refers to a technique, originally proposed by Ronald L. Rivest, that is intended to extend the strength of DES.
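The padding rule can be sketched in a few lines. This sketch assumes sizes that double from 10 kbytes (10, 20, 40, 80, ... kbytes, following the text's example), appends random bytes to the already-encrypted resource, and assumes the original length travels inside the encrypted locator so that the rewebber knows where to truncate; the function names are hypothetical.

```python
import os


def bucket_size(length: int, base: int = 10 * 1024) -> int:
    """Smallest size in the series base, 2*base, 4*base, ... that can hold `length` bytes."""
    size = base
    while size < length:
        size *= 2
    return size


def pad(encrypted: bytes) -> bytes:
    """Publisher side: append random bytes up to the next fixed size."""
    return encrypted + os.urandom(bucket_size(len(encrypted)) - len(encrypted))


def truncate(padded: bytes, original_length: int) -> bytes:
    """Rewebber side: cut the padding off again before passing the resource on."""
    return padded[:original_length]
```

With this rule the 1234-byte resource from the example is stored as exactly 10240 bytes, indistinguishable by size from every other resource of up to 10 kbytes.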

To implement chaining, the URL in the encrypted portion of a locator is replaced with another (complete) locator, one which points to a rewebber at a different site (and preferably, in a different legal jurisdiction). As mentioned above, this process can be iterated, thus making the rewebber chain listed in the locator as long as one likes.

When using rewebber chaining, the resource will need to be multiply encrypted on the server. To do this, the publisher randomly selects a DESX key for each rewebber in the chain, encodes them into the locator, and iteratively encrypts the resource. In this way, the publisher forms a locator and announces it in public (e.g., by using an anonymous remailer service to post it to a newsgroup). Note that all of the security parameters, including the length of the rewebber chain, are under the control of the publisher; individual publishers may adjust these parameters to fit their anonymity needs. The main benefit of chaining is that only the rewebber closest to the browser ever sees the decrypted data, and only the rewebber closest to the server knows where it is really getting data from. Linking the two would require the cooperation of every rewebber in the chain. This avoids the existence of a single point of failure, and allows trust to be distributed throughout the network.
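The interplay of nested locators and multiply encrypted resources can be modelled compactly. In this sketch no real cryptography protects the locator: the `Sealed` class merely stands in for public-key encryption of each locator layer, and `keystream_xor` is a toy SHA-256 counter-mode stream cipher standing in for DESX. Rewebber names and URLs are hypothetical.

```python
import hashlib
import os
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher standing in for DESX: XOR with SHA-256(key || counter).
    XOR is its own inverse, so the same call encrypts and decrypts."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


@dataclass(frozen=True)
class Sealed:
    """Stand-in for public-key encryption: only `recipient` can open it."""
    recipient: str
    payload: Any

    def open(self, name: str) -> Any:
        if name != self.recipient:
            raise PermissionError(f"{name} cannot open a layer sealed for {self.recipient}")
        return self.payload


def publish(chain: List[str], url: str, resource: bytes) -> Tuple[Sealed, bytes]:
    """chain[0] is the rewebber closest to the browser, chain[-1] the one closest to the server."""
    keys = [os.urandom(16) for _ in chain]  # one random key per rewebber
    stored = resource
    for key in keys:  # iteratively encrypt: key of chain[0] first, key of chain[-1] outermost
        stored = keystream_xor(key, stored)
    locator: Any = url
    for name, key in zip(reversed(chain), reversed(keys)):  # innermost layer belongs to chain[-1]
        locator = Sealed(name, (locator, key))
    return locator, stored


def rewebber_fetch(name: str, locator: Sealed, server: Dict[str, bytes]) -> bytes:
    """What rewebber `name` does with a request for `locator`."""
    inner, key = locator.open(name)
    if isinstance(inner, Sealed):
        data = rewebber_fetch(inner.recipient, inner, server)  # forward to the next rewebber
    else:
        data = server[inner]  # plain GET of the real URL (only chain[-1] ever sees it)
    return keystream_xor(key, data)  # strip this rewebber's layer of encryption
```

Because each rewebber removes only its own layer on the way back, the plaintext first appears at `chain[0]`, while the real URL is only ever visible to `chain[-1]`, matching the separation of knowledge described above.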

Once a rewebber network is deployed, the stage will be set for the ability to publish anonymously on the Web. One major drawback, however, is that a locator that contains just a simple chain of rewebbers looks something like this:

http://rewebber.com/!RjViOrawjGRT50ECKo-UBa
7Qv3FJIRbyej_Whl0g_9vpPAyeHinrYElQLlH2ifN
h2Ma4UYt31aqeQRXXd7oxEvwR8wJ3cnrNbPF6rcl
Uzr6mxJWUtlgW0uRJL0bGkAv3fX8WEcBdlJPWG
T8VoY0FljxgPL7OvuV0xtbMPsRbQg0iY=RKLBa
U YedsCnON-U Q0m5 J WTE 1 nuoh _ J5 J _ y g 1 CfkaN9j
SGkdf5 1 -gdj3RN4XHf_\^yxfupgc8VPsSyFdEeR0
dj9kMHuPvLivE_awqAwU_3Af8mc44QBN0fMVJj
peyHSa79KdTQ5EGlPzLK7upFXlUFcNLSD7YLSc
1 gKI3X8nk 1 5s=RXbQaqmOAx4Vli KPwkLVK.MM
Jaz9wchn_pl48xhTzgndl5Hk09VToLyz7EF4wGH3X
KPD7YbKVyiDZylva-sUBcdqpmPXTzApYLBnl4ii
DOylolPiilRky8CxRrnC9BvQqof853n99vkuGlCP9
K4p3H7pl6i8DOal-NrOlndpz5xgwZKc=/

Obviously, this is hard to announce in public and 
there is a naming problem that needs to be solved 
(similar to the announcement of the encrypted URLs 
for the JANUS service). 

In [12], it is proposed to create a virtual namespace called the .taz namespace (TAZ standing for 'Temporary Autonomous Zone'), and to create new servers called TAZ servers to resolve this namespace. The function of a TAZ server is to offer publishers an easy way to point potential readers at their material, as well as offering readers an easy way to access it. A TAZ server consists essentially of a public database mapping virtual hostnames ending in .taz to rewebber locators. Note that nothing in this database needs to be kept secret. Unlike an anon.penet.fi-style remailer (which associates an alias e-mail address with a real one), TAZ servers merely associate .taz addresses with locators. Most importantly, the TAZ server administrator cannot decrypt the locators that are stored in the database. There is the potential for a great deal of future work on TAZ servers. Centralized solutions such as the one implemented so far may work well for a while, but in the future decentralized solutions may be preferable, for both scalability and availability reasons. A great deal could be learnt from the Internet's DNS. In any case, more implementation and real-life deployment experience is needed to understand the engineering trade-offs better.
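Because a TAZ server only stores public, opaque strings, its core reduces to a lookup table. A minimal sketch of a centralized TAZ server follows, assuming first-come, first-served registration (a hypothetical policy; [12] does not prescribe one here) and treating each locator as an opaque string that the server never decrypts.

```python
from typing import Dict


class TazServer:
    """Public mapping from virtual .taz hostnames to opaque rewebber locators."""

    def __init__(self) -> None:
        self._table: Dict[str, str] = {}

    def register(self, name: str, locator: str) -> None:
        if not name.endswith(".taz"):
            raise ValueError("TAZ names must end in .taz")
        if name in self._table:
            raise ValueError(f"{name} is already registered")  # first come, first served
        self._table[name] = locator  # stored as-is; the server cannot decrypt it

    def resolve(self, name: str) -> str:
        return self._table[name]
```

A decentralized, DNS-like design as suggested above would replace the single dictionary with delegated, replicated tables, but the interface (register a name, resolve it to a locator) could stay the same.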



6. Conclusions 

Although browsing on the Web feels like an anonymous activity, it is hardly that way. Website administrators generally get a lot of information about users and their browsing behavior, enabling such things as one-to-one marketing. Even more information is available to Internet service providers whose HTTP proxy servers may keep track of every Website visited by their subscribers. Similarly, it is difficult to publish data on the Web without revealing the corresponding HTTP server's name or IP address. In this situation, privacy protection and anonymity services for the WWW are becoming increasingly important fields of study. In this paper, we have overviewed and briefly discussed the technologies that are available today and that can be used for anonymous browsing and anonymous publishing on the WWW.

In addition to these technical approaches to providing anonymity services, there are also some voluntary, cooperative privacy standards in development. For example, the W3C project Platform for Privacy Preferences (P3P) seeks to provide a platform for trusted and informed online interactions. The goal of P3P is to enable users to exercise preferences about Websites' privacy practices. P3P applications will allow users to be informed about Website practices, delegate decisions to their agent when they wish, and tailor relationships with specific sites [14]. It is assumed that users' confidence will increase when they are presented with meaningful choices about services and their privacy practices. Further information on P3P can be found at URL http://www.w3.org/P3P/.

Similarly, TRUSTe is an independent, non-profit organization dedicated to establishing a trusting environment where users can feel comfortable dealing with companies on the Internet. Championed by CommerceNet and the Electronic Frontier Foundation (EFF), TRUSTe's efforts focus on promoting trust through online privacy assurance, putting users in control of their personal information. Based on the principles of disclosure and informed consent, the TRUSTe Privacy Program utilizes a branded, online 'seal' or trustmark to signify disclosure of a Website's personal information privacy policy. Sites that display the trustmark have formally agreed to adhere to the TRUSTe privacy principles, and to disclose their information gathering and dissemination practices. These companies must disclose what information they gather, how the information will be used, and with whom they share information. To ensure that TRUSTe's privacy principles and the Website's disclosed practices are met, and that the trustmark is not being misused, TRUSTe oversees a comprehensive assurance process. Further information on TRUSTe can be found at URL http://www.truste.org.

In summary, the handling of personal information on the Web is a hotly debated topic. The need to maximize users' privacy is at odds, at a fundamental level, with businesses' need to minimize fraud. The first goal seeks to maximize users' anonymity, whereas the second goal requires users to be strongly and unequivocally identified and authenticated. Somehow, a compromise must be found to resolve this dilemma.
is hidden, while its receiver and the message itself might not be. Similarly, receiver anonymity means that the identity of the receiver is hidden, while its sender and the message itself might not be. Finally, unlinkability of sender and receiver (also referred to as connection anonymity) means that though the sender and receiver can each be identified as participating in some communication, they cannot be identified as communicating with each other.

In the recent past, privacy protection and anonymity 
services for the WWW have become increasingly 
important fields of study. The aim of this paper is 
to overview and briefly discuss the technologies that 
are available today and that can be used for anony- 
mous browsing and anonymous publishing on the 
WWW. The rest of the paper is organized as follows: 
Section 2 elaborates on previous work. Section 3 
addresses the use of cookies and their implications 
for the privacy of Web users. Finally, Sections 4 
and 5 address anonymous browsing and anonymous
publishing, and Section 6 concludes with some final 
remarks. 



2. Previous work 

There is some previous work in providing 
anonymity services for electronic mail (e-mail) ser- 
vices. For example, anon.penet.fi was a sim- 
ple and easy-to-use anonymous e-mail forwai'ding 
service (a so-called anonymous remailer) that was 
operated by Johan Helsingius in Finland. In short, 
the anon, pe net. fi anonymous remailer was pro- 
vided by an SMTP proxy server that stripped off all 
SMTP header information of each incoming e-mail 
message before forwarding it to its destination. Then, 
if not already assigned, an alias for the sender was cre- 
ated. In the outgoing message, the real e-mail address 
of the sender was replaced by the alias that allowed 
the recipieni(s) of the message to reply to the sender 
without knowing his or her real identity or e-mail 
address. ConsequenUy, anon.penet.fi provided 
sender anonymity by simply keeping the mapping be- 
tween real e-mail addresses and their corresponding 
aliases anonymous. The downside of this simple ap- 
proach was that any user of anon .penet . f i had 
to tnist the service provider not to reveal his or her 



real identity or e-mail address. This level of trust may 
not always be justified. ' 
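The alias mechanism, and why the operator becomes a single point of trust, can be sketched as a toy model. The alias format (`an<N>@remailer.example`) and the reduced message representation (a dictionary of headers plus a body) are hypothetical; the sketch only mirrors the steps described above: drop the incoming headers, assign an alias once, and use the stored mapping to route replies.

```python
import itertools
from typing import Dict


class AliasRemailer:
    """Toy model of an anon.penet.fi-style remailer (hypothetical alias format)."""

    def __init__(self, domain: str = "remailer.example") -> None:
        self.domain = domain
        self._alias_of: Dict[str, str] = {}  # real address -> alias
        self._real_of: Dict[str, str] = {}  # alias -> real address: the table that must stay secret
        self._counter = itertools.count(1)

    def alias_for(self, real_address: str) -> str:
        """Return the sender's alias, creating one if none has been assigned yet."""
        if real_address not in self._alias_of:
            alias = f"an{next(self._counter)}@{self.domain}"
            self._alias_of[real_address] = alias
            self._real_of[alias] = real_address
        return self._alias_of[real_address]

    def forward(self, headers: Dict[str, str], body: str) -> Dict[str, str]:
        """Discard all incoming headers and rebuild a minimal message sent from the alias."""
        return {"From": self.alias_for(headers["From"]), "To": headers["To"], "Body": body}

    def route_reply(self, alias: str) -> str:
        """A reply addressed to an alias is delivered to the real address behind it."""
        return self._real_of[alias]
```

Everything rests on `_real_of`: whoever can read that table (the operator, or anyone who obtains it by legal or other means) can de-anonymize every user, which is exactly the trust problem noted above.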

A more sophisticated approach to providing an anonymous e-mail forwarding service was proposed by David Chaum in the early 1980s [2]. In fact, Chaum introduced a technology that uses public key cryptography to provide anonymity services in a so-called Chaum mixing network. According to this terminology, a Chaum mix refers to an anonymous remailer that, in addition to forwarding incoming e-mail messages, strives to hide the relationship between incoming and outgoing data traffic. To achieve this, a Chaum mix typically encrypts the data traffic and may reorder, delay, and pad data to prevent or at least complicate traffic analysis.

In a Chaum mixing network, the sender of an anonymous e-mail message first chooses a route through a series of mixes to the intended destination, and then wraps some extra layers of data around the message. To form the innermost layer, the name of the last mix $M_n$ (the mix one hop away from the message destination) is concatenated with the original message, and the result is encrypted with the public key of the second-to-last mix $M_{n-1}$ in the route. Consequently, the resulting bundle has one layer of routing data prepended to the original message, and it is encrypted with a key possessed only by $M_{n-1}$. If the bundle were to somehow arrive at $M_{n-1}$, it could be decrypted there, and the one layer of remaining routing data would be enough to get the message to $M_n$ and from there to its final destination. This sort of message encapsulation can be repeated, the next time with the third-to-last mix $M_{n-2}$. The result is a bundle that can be decrypted only by $M_{n-2}$. Once decrypted there, the interior can be forwarded to $M_{n-1}$. At the same time, however, $M_{n-2}$ cannot read the interior of the bundle, since that part is encrypted with the public key of $M_{n-1}$ (of which the corresponding private key is known only by $M_{n-1}$).

One may think of this encapsulation scheme as an onion that is prepared by the sender. On the forward route to the destination, each mix peels off one layer of encryption.

^ On 8 February 1995, based on a burglary report filed with the Los Angeles police and transmitted by Interpol, the Finnish police presented Helsingius with a warrant for search and seizure. Bound by law to do so, he complied, thereby revealing the real e-mail address of a single user.
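The layering procedure for the route $M_1, \ldots, M_n$ can be traced in code. This is a structural sketch only: the `Sealed` class stands in for public-key encryption (it performs no cryptography; it simply refuses to open for anyone but its recipient), and mix names and the message are hypothetical. As in the description, the innermost layer pairs the name of $M_n$ with the message and is sealed for $M_{n-1}$, and $M_n$ finally receives the message itself, which carries its own destination.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple


@dataclass(frozen=True)
class Sealed:
    """Stand-in for public-key encryption: only `recipient` may open it."""
    recipient: str
    payload: Any

    def open(self, mix_name: str) -> Any:
        if mix_name != self.recipient:
            raise PermissionError(f"{mix_name} cannot decrypt a bundle sealed for {self.recipient}")
        return self.payload


def build_onion(route: List[str], message: str) -> Sealed:
    """Sender side: wrap `message` for the route M_1..M_n (n >= 2)."""
    n = len(route)
    if n < 2:
        raise ValueError("route needs at least two mixes")
    # Innermost layer: name of M_n plus the message, sealed for M_{n-1}.
    bundle = Sealed(route[n - 2], (route[n - 1], message))
    # Repeat outward: name of M_{i+1} plus the current bundle, sealed for M_i.
    for i in range(n - 3, -1, -1):
        bundle = Sealed(route[i], (route[i + 1], bundle))
    return bundle


def route_through_mixes(route: List[str], onion: Sealed) -> Tuple[List[str], str]:
    """Simulate each mix peeling one layer; return the hops taken and what M_n receives."""
    hops = [route[0]]
    current_mix, item = route[0], onion
    while isinstance(item, Sealed):
        next_hop, item = item.open(current_mix)
        hops.append(next_hop)
        current_mix = next_hop
    return hops, item
```

Each mix learns only the name of its successor; an attempt by any other mix to open a layer fails, which models the statement that $M_{n-2}$ cannot read the interior sealed for $M_{n-1}$.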

If a Chaum mixing network were used to transmit e-mail messages through only one mix, this mix would have to be trusted not to reveal the senders' and receivers' identities (since it sees both of them). Consequently, most people forward e-mail messages through two or more mixes so that no single mix sees both the sender and the receiver of a particular e-mail message. In other words, using two or more mixes keeps the sender anonymous to every mix but the first and the receiver anonymous to every mix but the last. It follows that a user's identity is best hidden if he runs his own Chaum mix and directs all of his outgoing e-mail messages through it, since the first mix, the only one that sees the sender, is then under his own control.

If one is worried about an adversary powerful enough to monitor several Chaum mixes in a network simultaneously, then one also has to worry about timing and other correlation attacks. In the extreme case, suppose a Chaum mixing network is idle until a message is sent out. Then, even though an adversary cannot decrypt the layered encryption, he can still locate the route just by watching the right parts of the network and analyzing the data traffic accordingly. Chaum mixing networks can resist such attacks with queues to batch, reorder, and process incoming messages. In fact, each mix may keep quiet (absorbing incoming messages but not transmitting them) until its outbound buffer starts to overflow, at which point the mix emits a randomly chosen message to its next hop. However, due to the real-time requirements of some applications, the batching, reordering, and processing of data in a queue is not always possible.
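The buffering strategy just described (stay quiet until the buffer overflows, then emit a randomly chosen message) can be sketched as a small pool mix. The class name and the fixed capacity are illustrative assumptions; a deployed mix would also pad messages and possibly use timers, which are omitted here.

```python
import random
from typing import List, Optional


class PoolMix:
    """Absorbs messages until its buffer overflows, then emits one chosen at random."""

    def __init__(self, capacity: int, rng: Optional[random.Random] = None) -> None:
        self.capacity = capacity
        self.buffer: List[str] = []
        self.rng = rng if rng is not None else random.Random()

    def receive(self, message: str) -> Optional[str]:
        self.buffer.append(message)
        if len(self.buffer) > self.capacity:
            # Overflow: send a randomly chosen buffered message to the next hop.
            return self.buffer.pop(self.rng.randrange(len(self.buffer)))
        return None  # keep quiet: nothing leaves the mix yet


mix = PoolMix(capacity=3, rng=random.Random(42))
emitted = []
for i in range(10):
    out = mix.receive(f"msg{i}")
    if out is not None:
        emitted.append(out)
```

Since the outgoing message is drawn at random from the whole pool, an observer who sees a message arrive cannot simply match it to the next message leaving the mix; the price is exactly the added latency that makes this approach awkward for real-time applications.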

One question arises immediately with regard to the use of anonymous remailer services: how can the receiver of an (anonymous) e-mail message reply to the proper sender? The answer is that he cannot, unless he is explicitly told how to do so. A simple technique is to tell the receiver to send his reply to a certain newsgroup with a specific subject field. The reply can then be grabbed by the sender from the corresponding newsgroup. This way of replying is untraceable, but it is also expensive and unreliable. A more sophisticated technique would use the knowledge of how to build an untraceable forward route from the sender to the receiver to build an untraceable backward route from the receiver to the sender (note that the forward and backward routes are independent; they can be identical or completely disjoint). According to this technique, the sender computes a block of information that is used to anonymously return a reply from the receiver to the sender. This additional block of information is sometimes also referred to as a return path information (RPI) block. The RPI block is prepended to the original message and the padding data that are sent from the sender to the receiver in the first place.
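The routing part of an RPI block can be modelled with the same layering idea, run in the backward direction. As before, `Sealed` is only a stand-in for public-key encryption (no cryptography), and mix names and addresses are hypothetical. The sketch covers routing only; in Chaum's scheme the reply content would also be transformed at each hop, which is omitted here.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple


@dataclass(frozen=True)
class Sealed:
    """Stand-in for public-key encryption: only `recipient` may open it."""
    recipient: str
    payload: Any

    def open(self, mix_name: str) -> Any:
        if mix_name != self.recipient:
            raise PermissionError(f"{mix_name} cannot open a layer sealed for {self.recipient}")
        return self.payload


def build_rpi(backward_route: List[str], sender_address: str) -> Sealed:
    """Sender side: wrap the return route so that each mix learns only its successor."""
    block = Sealed(backward_route[-1], (sender_address, None))  # only the last mix learns the address
    for i in range(len(backward_route) - 2, -1, -1):
        block = Sealed(backward_route[i], (backward_route[i + 1], block))
    return block


def send_reply(rpi: Sealed, reply: str) -> Tuple[str, List[str], str]:
    """Receiver side: hand reply and RPI block to the first mix; each mix peels one layer."""
    hop, block, visited = rpi.recipient, rpi, []
    while block is not None:
        visited.append(hop)
        hop, block = block.open(hop)
    return hop, visited, reply
```

The receiver only ever learns `rpi.recipient`, the first mix on the backward route, yet the reply still reaches the original sender.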

The use of Chaum mixes to provide anonymous e-mail forwarding and RPI-based reply services was prototyped by Ceki Gülcü and Gene Tsudik at the IBM Zurich Research Laboratory. They used the scripting language Perl and the Pretty Good Privacy (PGP) software to build a system called BABEL [7].

Unfortunately, the lessons learnt from anonymous
remailer services do not necessarily hold for WWW 
data traffic, since the characteristics of e-mail and 
Web-based applications are fundamentally different: 

• First, the WWW is an interactive medium, while 
e-mail is store-and-forward; 

• Secondly, e-mail is a 'push' technology, meaning 
that the sender of an e-mail message initiates the 
data transfer, possibly without even the knowledge 
or consent of the receiver (the existence of e-mail 
bombing attacks illustrates this point). By contrast, 
the WWW is a 'pull' technology, meaning that 
the receiver must explicitly request data from the 
sender. 

The first difference implies that Chaum mixing networks are unacceptable (or at least difficult to use) for Web traffic, since the batching and delaying on which they rely conflict with interactive use. Nevertheless, the second difference also offers some possibilities to improve security (in terms of anonymity). For obvious reasons, the security of an anonymity-providing system, such as a Chaum mixing network, increases as the number of available and publicly accessible cooperating nodes (e.g., Chaum mixes) increases. In the realm of e-mail, operators of anonymous remailers often come under fire when their services are abused by people sending threatening letters or spam. In fact, the undesirability of handling irate users keeps the number of anonymous remailers considerably low, potentially weakening the security of the overall system. By contrast, an HTTP or Web server cannot initiate a connection with an unwilling browser and send it data when no request was made. This 'consensual' nature of the Web should discourage fewer potential node operators, and therefore lead to corresponding increases in security. Finally, note that HTTP proxy servers are also well suited to implementing anonymity services, mainly because of their inherent caching capabilities (intended to improve network performance). The very fact that data is cached at particular proxy servers makes it less likely that requests get forwarded all the way to the origin server. This makes traffic analysis more complicated and harder to accomplish.

In contrast to the work on anonymous e-mail forwarding services, there is little research going on to provide anonymity services for the WWW. As a matter of fact, the consensus on WWW privacy protection is that there just is not much of it and that commercial interests are unlikely to champion the cause.

Before we delve into the details of anonymous 
browsing and anonymous publishing on the Web, 
we briefly discuss the use of cookies for HTTP state 
management. 



3. Cookies 

Suppose that a WWW server is to be configured to collect information about particular users in order to customize subsequent sessions.^ In this situation, there are two possibilities:

• The server is configured to locally store the state information on a per-user basis;

• The server is configured to download the state information to the browsers, where it is stored on the server's behalf.

Following the first approach, the server would have to build a huge database to store and make available state information related to particular users. This database tends to grow very rapidly. In the second approach, by contrast, the state information is not stored locally. Instead, the information is downloaded to the browser, where it is stored in a decentralized and fully distributed way. The next time the browser connects to the server, it simply retransmits the appropriate state information (the information that 'belongs' to this particular server).

^ Note that the term 'session' here does not refer to a persistent HTTP connection but rather to a logical session created from HTTP request and response messages that belong together and correspond to each other.

The HTTP state management mechanism as specified in RFC 2109 follows the second approach [13]. It uses the term 'cookie' to refer to the data string that encodes the state information that passes between the origin server and the browser, and that gets stored by the browser. More precisely, the mechanism specifies a way to create a stateful session with HTTP request and response messages. It describes two new headers, namely the Set-Cookie and the Cookie headers, that carry state information between participating origin HTTP servers and browsers.

To initiate a session, the origin HTTP server returns an extra response header to the browser, the Set-Cookie header. The browser, in turn, returns a corresponding Cookie header to the server if it chooses to continue a session. The origin server may ignore it or use it to determine the current state of the session. The syntax and semantics of the Set-Cookie and Cookie headers are fully specified in the RFC mentioned above. For example, a typical Set-Cookie header would look as follows:

Set-Cookie: USER_NAME=Rolf; path=/; expires=Thursday, 18-Nov-99 23:12

The browser stores the cookie locally. If it requests a resource under the path / on the same server (before 18 November 1999), it would send out the following Cookie header:

Cookie: USER_NAME=Rolf

Now the server knows that the requesting user has been previously assigned the username 'Rolf'. If there are other attributes being stored in the cookie, the server may also customize its behavior for this particular user.
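The browser's side of this exchange can be sketched with Python's standard `http.cookies.SimpleCookie`, which parses the value part of a Set-Cookie header (without the `Set-Cookie:` prefix). The function name is hypothetical, and the sketch deliberately ignores the expires and domain attributes; it only stores the cookie and decides from the path attribute whether to send it back.

```python
from http.cookies import SimpleCookie
from typing import Optional


def cookie_header_for(set_cookie_value: str, request_path: str) -> Optional[str]:
    """Return the Cookie header a browser would send for `request_path`, or None.

    Only the path attribute is checked; expiry and domain matching are left out.
    """
    jar = SimpleCookie()
    jar.load(set_cookie_value)  # e.g. "USER_NAME=Rolf; path=/"
    pairs = []
    for name, morsel in jar.items():
        path = morsel["path"] or "/"  # an unset attribute is an empty string
        if request_path.startswith(path):
            pairs.append(f"{name}={morsel.value}")
    return "Cookie: " + "; ".join(pairs) if pairs else None
```

Called with the example cookie and any request path under /, it yields the Cookie header shown above; a cookie restricted to another path is simply not sent.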

The use of cookies raises some privacy concerns. In fact, a server may use Set-Cookie and Cookie headers to track the path of a user through the server. Users may object to this behavior as an intrusive accumulation of information, even if their identity may not be evident (identity may be evident if a user fills out a form that contains identifying information). Consequently, a user should be able to enable or disable the HTTP state management mechanism, and to accept or refuse cookies accordingly. This is possible in all major browsers, including Netscape Navigator and Microsoft Internet Explorer.*

As of this writing, the HTTP state management mechanism is not cryptographically protected and must be considered to be insecure. More recently, research has addressed possibilities to secure the HTTP state management mechanism with cryptographic technologies [15].

4. Anonymous browsing 

In this section, we address four technologies that 
can be used to protect the privacy of Web users and 
to support anonymous browsing accordingly. 

4.1. The Anonymizer

The Anonymizer^ is a simple service that can be used to browse anonymously through the Web. It is probably the most heavily used anonymizing service for the WWW. In short, the Anonymizer service is provided by an HTTP proxy server that runs on port 8080 of www.anonymizer.com. A user connects to this port, and the corresponding proxy server forwards the target URL to the origin HTTP server. In essence, the Anonymizer service is for the Web what the anon.penet.fi service was for e-mail: a simple and easy-to-use anonymous forwarding service. However, contrary to anon.penet.fi, the Anonymizer is a commercial service. You can either use this service for free (and pay a penalty in terms of an additional 30-60 s delay per Web page and advertisements being included in the retrieved pages), or sign up for a paid account.

For example, if you want to use the Anonymizer service to retrieve the author's homepage located at URL http://www.ifi.unizh.ch/~oppliger, you type in http://www.anonymizer.com:8080/www.ifi.unizh.ch/~oppliger. This request makes the Anonymizer's HTTP proxy server retrieve the page from www.ifi.unizh.ch on your behalf. In doing so, the only traces left in the log files of the origin HTTP server are requests originating from sol.infonex.com (sol.infonex.com is the domain name system (DNS) name of the machine that currently hosts the Anonymizer service). Consequently, the administrator(s) of www.ifi.unizh.ch will not be able to determine the IP addresses or DNS names of the systems that originally requested the above-mentioned URL through the Anonymizer service.

Anonymizer is a trademark of Anonymizer, Inc. The service provider can be contacted at URL http://www.anonymizer.com.

In summary, the Anonymizer service is well suited to hiding user identities and browser IP addresses from HTTP servers. Nevertheless, there are at least two problems that should be kept in mind and considered with care when using the service:

• First, a user has to trust the service provider not to reveal his identity. Note that the HTTP proxy server can be set up in a way that logs all requested URLs. Consequently, the Anonymizer service provider may obtain a fairly complete picture of its users' browsing behavior on the Web, and users have to trust the provider not to reveal (or sell) this picture. Also, according to the Anonymizer user agreement, the logs are kept much longer than is technically required (15 days). So the Anonymizer can be raided just as anon.penet.fi was a couple of years ago.

• Secondly, although the Anonymizer is good at hiding user identities and browser IP addresses from origin HTTP servers, it is not good at hiding server identities from the network segment(s) between the browser and the Anonymizer. In the example given above, the local network administrator (or the administrator of any proxy between the client and the Anonymizer service) can see that www.ifi.unizh.ch/~oppliger is being requested simply by inspecting the requested URL.
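The URL rewriting used by the Anonymizer, and the reason it leaks the target to anyone on the path to the proxy, can be sketched in a few lines of Python. The helper names are hypothetical; the URL format simply mirrors the example given earlier in this section.

```python
ANONYMIZER_PREFIX = "http://www.anonymizer.com:8080/"


def anonymizer_url(target: str) -> str:
    """Prefix the target URL (without its scheme) with the Anonymizer proxy address."""
    if target.startswith("http://"):
        target = target[len("http://"):]
    return ANONYMIZER_PREFIX + target


def observed_target(request_url: str):
    """What an observer between browser and proxy learns: the target, in the clear."""
    if request_url.startswith(ANONYMIZER_PREFIX):
        return request_url[len(ANONYMIZER_PREFIX):]
    return None
```

Because the target is simply appended to the proxy address, no cryptographic work is needed to recover it; this is what motivates encrypting the target URL for the proxy.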

Obviously, a possible solution to the first problem 
is to chain several anonymous HTTP proxy servers 
similar to the Anonymizer, whereas a solution to the 
second problem is to encrypt the target URL in a way 
that can be decrypted only by appropriate HTTP proxy 
servers (e.g., using the public keys of these servers). 

4.2. Onion routing 

More recently, a group of researchers at the US Naval Research Laboratory (NRL) has adapted the idea of using a Chaum mixing network to implement anonymous connections. Anonymous connections are similar to TCP/IP connections, except that they are resistant to both passive and active eavesdropping and to traffic analysis attacks. Anonymous connections are bidirectional, have small latency, and can be used anywhere a TCP/IP connection can be used. Note that a connection may be anonymous although the communication need not be (e.g., if the data stream is not encrypted). The NRL researchers have prototyped the technology in a system called onion routing [6,9,10].

In the descriptions that follow, we use the term
'onion' to refer to a layered encrypted message, 
whereas the term 'onion router' is used to refer to a 
Chaum mix that acts as forwarding node in an onion 
routing network. 

In an onion routing network, instead of making 
TCP/IP connections directly to a responding machine 
(a responder), an initiating application (an initiator) 
establishes an anonymous connection through a se- 
quence of onion routers. Contrary to normal routers, 
onion routers are connected by permanent and long-standing TCP/IP connections. Although the technology is called onion routing, the routing that occurs does so at the application layer (and not at the Internet layer). For any anonymous connection, the sequence
of onion routers in a route is strictly defined at connec- 
tion setup, and each onion router can only identify the 
previous and next hops along the route. Due to the use 
of encryption technologies, the data that passes along 
the anonymous connection appears differently at each 
onion router, so data cannot be tracked en route and 
compromised onion routers cannot collude. 

The onion routing system is conceptually similar to the PipeNet proposal that was posted by Wei Dai to the Cypherpunks mailing list in February 1995. Contrary to the onion routing system, the PipeNet proposal has not been implemented so far.

In onion routing, an application talks directly neither to a router nor to an onion router. Instead, there must be proxies that interface between the applications and the onion routing network. For example, to access a Website through an onion routing network, a user must set the browser's HTTP proxy to point to an onion network entry point (a so-called 'application proxy'). In fact, the initiator establishes a TCP/IP connection to an application proxy, and this proxy defines a (possibly random) route through the onion routing network by constructing a layered data structure (an onion) and sending that onion through the network. Similar to a Chaum mixing network, each layer of the onion is encrypted with the public key of the intended onion router and defines the next hop in the route. An onion's size is fixed, so each onion router adds some random padding data to replace the removed layer. The last onion router forwards data to the responder's application proxy, whose job is to pass data between the onion routing network and the responder. In addition to carrying next hop information, each onion layer also contains key seed material from which cryptographic keys are derived (for encrypting and decrypting data sent forward and backward on the route of the anonymous connection).
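The layered construction of an onion can be sketched as follows. This is a toy illustration under loudly stated assumptions, not the NRL implementation: a symmetric key per router stands in for the router's public key, the "cipher" is a keystream derived from SHA-256 that is not secure, the JSON layer format is hypothetical, and the fixed onion size with random padding is omitted.

```python
import hashlib
import json
import os


def xor_stream(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher (NOT secure): XOR data with SHA-256(key || counter) blocks."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ k for b, k in zip(data[offset:offset + 32], block))
    return bytes(out)


def build_onion(route, router_keys, exit_hop):
    """Wrap one layer per router; the innermost layer belongs to the last router."""
    onion = b""
    next_hop = exit_hop
    seeds = {}
    for router in reversed(route):
        seeds[router] = os.urandom(16)  # key seed material for this hop
        layer = {"next": next_hop, "seed": seeds[router].hex(), "inner": onion.hex()}
        # Symmetric key stands in for encryption under the router's public key.
        onion = xor_stream(router_keys[router], json.dumps(layer).encode())
        next_hop = router
    return onion, seeds


def peel(onion, router_key):
    """Remove one layer: the router learns only its next hop, its seed, and the inner onion."""
    layer = json.loads(xor_stream(router_key, onion))
    return layer["next"], bytes.fromhex(layer["seed"]), bytes.fromhex(layer["inner"])
```

Peeling the onion at R1, R2, and R3 in turn reveals the next hops R2, R3, and finally the responder's application proxy; no router sees more than its immediate neighbors.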

After having sent the onion, the initiator's applica- 
tion proxy starts sending data through the anonymous 
connection. As data moves through the anonymous connection, each onion router removes one layer of
encryption, so it finally arrives as plaintext. Obviously, 
the layering occurs in the reverse order for data mov- 
ing backward to the initiator. Stream ciphers are used 
for data encryption and decryption. Similar to the orig- 
inal idea of a Chaum mixing network, onion routers 
may also randomly reorder the data they receive be- 
fore forwarding it (but preserve the order of data in 
each anonymous connection). 
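The data phase described above can be illustrated with a similar toy. The assumptions are the same as before: the SHA-256 keystream is not a real stream cipher (and, unlike one, restarts for every cell), deriving per-direction keys by hashing the seed with a label is a hypothetical key derivation, and all names are illustrative. Because these toy layers are plain XORs they happen to commute; with a real cipher the order shown in the comments matters.

```python
import hashlib


def xor_stream(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher (NOT secure): XOR data with SHA-256(key || counter) blocks."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ k for b, k in zip(data[offset:offset + 32], block))
    return bytes(out)


def derive_key(seed: bytes, direction: bytes) -> bytes:
    """Derive a per-direction key from a hop's key seed (hypothetical KDF)."""
    return hashlib.sha256(seed + direction).digest()


def initiator_send(data: bytes, seeds) -> bytes:
    """Add one forward layer per hop; the first router's layer ends up outermost."""
    for seed in reversed(seeds):
        data = xor_stream(derive_key(seed, b"forward"), data)
    return data


def relay_forward(data: bytes, seed: bytes) -> bytes:
    """An onion router removes its forward layer."""
    return xor_stream(derive_key(seed, b"forward"), data)


def relay_backward(data: bytes, seed: bytes) -> bytes:
    """On the way back, each onion router adds its backward layer."""
    return xor_stream(derive_key(seed, b"backward"), data)


def initiator_receive(data: bytes, seeds) -> bytes:
    """The initiator strips all backward layers."""
    for seed in seeds:
        data = xor_stream(derive_key(seed, b"backward"), data)
    return data
```

After the last router removes its layer, the request emerges as plaintext, while the bytes seen at each earlier router differ from one another; replies travel the reverse way and are unwrapped only by the initiator.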

As mentioned previously and contrary to the orig- 
inal idea of a Chaum mixing network, the batching 
technique is out of the question for the support of inter- 
active applications, such as Web browsing. This means 
that coordinated observation of the network links con- 
necting onion routers could reveal an anonymous con- 
nection's route and reveal the source and destination 
IP addresses accordingly. Therefore, it is important to 
ensure that the links between the onion routers cannot 
be simultaneously eavesdropped. The easiest approach 
is to put onion routers on different network segments in different buildings with different administrators, ones who would be unlikely to collude. Also note that by layering cryptographic operations in the way described above, an advantage is gained over plain old link layer encryption. Even though the total cryptographic overhead for passing data is the same as for link layer encryption, the protection is better. In link layer encryption, the chain is as strong as its weakest link: one compromised node can reveal everything. In onion routing, however, the chain is as strong as its strongest link: one honest onion router is enough to maintain the anonymity of the connection. Even if link layer encryption is used together with end-to-end encryption, compromised nodes can cooperate to reveal route information. This is not possible in an onion routing network, since data always appears differently to each onion router.

For TCP-based application protocols that are proxy-aware, such as HTTP, Telnet, and SMTP, there exist proxies for Sun Solaris. Surprisingly, for certain applications that are not proxy-aware, most notably rlogin, it has been possible to design interface proxies as well. In either case, the best protection results from having a connection between an application proxy and an onion router that is trusted in one way or another. For example, one possibility is to place an onion router on the firewall of a Website. In this case, the onion router would serve as an interface between the machines behind the firewall and the external network (most notably the Internet). This firewall configuration is commonly used today.

Refer to http://www.onion-router.net for current status and other useful information about onion routing.

4.3. Lucent Personalized Web Assistant

An increasing number of Websites require users to establish an account before they can access the site. This approach is sometimes called 'personalized Web browsing.' Typically, the user is required to provide a unique username, a password, and an e-mail address.

Establishing accounts at multiple sites is generally a tedious task. A user may have to invent a distinct username and a secure password, both unrelated to his identity, for each Website. In addition, the user may also want support for anonymous e-mail. Besides the information that the user supplies voluntarily to the Website, additional information about the user may flow involuntarily from the user's browser to the Website, due to the nature of HTTP and the use of cookies (as described in Section 3).

Against this background, a group of researchers at Lucent Technologies has developed a technology that makes personalized Web browsing simple, secure, and anonymous by providing convenient solutions to each of the problems mentioned above [8]. The technology has been implemented in a system called Lucent Personalized Web Assistant (LPWA). In short, the LPWA is an agent that may interact with several Websites on the user's behalf. It automatically derives a unique pseudonym or alias for a user at each site he visits, and transparently presents that alias to the site on request. Typically, the alias consists of a username, a password, and possibly an e-mail address. In general, a different alias is generated for each user and Website pair, but the same alias is presented whenever a user visits a particular Website.

By providing support for personalized Web browsing, the LPWA frees the user from the burden of inventing and memorizing distinct usernames and passwords for each Website, and guarantees that an alias (including an e-mail address) does not reveal the identity of the user. In addition, the LPWA provides support for a Website to reply to anonymous e-mail messages originated by a particular user. Finally, the LPWA is also able to filter the HTTP data traffic to preserve user privacy. As such, the developers of LPWA claim that their system 'provides simultaneous user identification and user privacy, as required for anonymous personalized Web browsing.'

At the core of anonymous personalized Web browsing is the problem of name translation: translating the user's e-mail address and secret into an alias that fulfils a number of properties, including anonymity, consistency, secrecy, uniqueness of an alias, and protection from the creation of dossiers. To address these requirements, a specific collision-resistant one-way hash function is used, and this function is called a Janus function. The inputs to the Janus function are the real LPWA username and password as well as the name of the requested Website [11].
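The interface of such a names-translation function can be sketched in Python. This is explicitly not the published Janus construction of [11]: here HMAC-SHA256 keyed with the user's secret serves as the keyed one-way function, and the alias format, the length cut-offs, and the e-mail domain are hypothetical. The sketch only illustrates the consistency and per-site uniqueness properties listed above.

```python
import hashlib
import hmac


def janus_alias(email: str, secret: str, site: str, mail_domain: str = "alias.example"):
    """Derive a (username, password, e-mail) alias that is stable per (user, site) pair."""

    def prf(label: str) -> str:
        # Keyed one-way function over (label, identity, site); HMAC stands in for Janus.
        message = f"{label}|{email}|{site}".encode()
        return hmac.new(secret.encode(), message, hashlib.sha256).hexdigest()

    username = "u" + prf("user")[:10]
    password = prf("pass")[:16]
    alias_email = prf("mail")[:12] + "@" + mail_domain
    return username, password, alias_email
```

Calling the function twice for the same site returns the same alias, whereas a different site yields an unrelated one; without the secret, the aliases cannot be linked back to the e-mail address.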

In practice, the LPWA can be configured as a remote server (central proxy), as a local server (local proxy), or anywhere in between (e.g., firewall proxy), with different trade-offs in terms of security, trust, and convenience. The configuration of the current demonstration is a single copy of LPWA running on a proxy at port 8000 of lpwa.com. To use this central proxy, one must first configure the browser to use it automatically. The basic idea is to set the browser's HTTP proxy to lpwa.com port 8000. Suppose that you want to register with and access the Website of a company that requires the submission of a username and password. LPWA can automatically generate this information, and even a unique e-mail address where you can receive return mail. For example, after setting your browser to use the LPWA proxy, you can go to the home page of the company. LPWA will interpose a couple of pages describing the LPWA service and asking you for your real identity (or at least the one from which your pseudonyms will be derived). Then, when LPWA sends you on to your original URL and the Website asks you to register, just give the string \u as your username, \p as your password, and \@ as your e-mail address. LPWA will intercept those codes and replace them with the nonsensical pseudonyms it uses for you at that particular site. Log entries in the origin HTTP server will then reveal requests originating from lpwa.com only.
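The substitution step performed by the proxy can be sketched as a function over submitted form fields. This is a hypothetical illustration: the field names and alias values are made up, only fields whose entire value is an escape code are replaced, and the escape code for the e-mail address is assumed here to be \@.

```python
def substitute_placeholders(form: dict, alias: tuple) -> dict:
    """Replace LPWA-style escape codes in form fields with the site-specific alias."""
    username, password, email = alias
    # Assumed escape codes: \u (username), \p (password), \@ (e-mail address).
    mapping = {"\\u": username, "\\p": password, "\\@": email}
    # Fields that are not escape codes pass through unchanged.
    return {field: mapping.get(value, value) for field, value in form.items()}
```

In a full proxy, the alias tuple would come from the names-translation function for the requested site, so the Website never sees the escape codes or the user's real identity.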

4.4. Crowds 

A group of researchers at AT&T Research has developed a system called Crowds for protecting users' anonymity on the WWW [5]. Unlike the other techniques addressed so far, Crowds does not rely on Chaum mixes at all. Instead, Crowds, named for the notion of 'blending into a crowd', operates by grouping users into a large and geographically diverse group (a so-called 'crowd') that collectively issues requests on behalf of its member users. As such, Crowds is essentially a distributed and chained Anonymizer, with encrypted links between individual crowd members. HTTP data traffic is forwarded to a crowd member, who flips a biased coin and, depending on the result, forwards it either to some other crowd member or to the origin HTTP server. This makes it hard for an observer (including the origin server) to tell which crowd member actually issued a given request.

More precisely, a crowd can be thought of as a collection of users. Each user is represented in a crowd by a process that runs on his system. In Crowds parlance, this process is called a jondo (pronounced 'John Doe' and meant to convey the image of a faceless participant). The user, or a system administrator acting on the user's behalf, starts the jondo. When it is started, it contacts a server called the blender to request admittance to the crowd. If the jondo is admitted, the blender reports to it the current membership of the crowd and information that enables it to participate in the crowd. The user, in turn, configures the jondo to serve as HTTP proxy by specifying its hostname and port number in his browser for all services, including Gopher, HTTP, and SSL.

Thus, any request originating from the browser is 
sent directly to the jondo. Upon receiving the first re- 
quest from the browser, the jondo initiates the estab- 
lishment of a random path of jondos to and from the 
origin HTTP server. More precisely, the jondo picks a 
jondo from the crowd (possibly itself) at random, and 
forwards the request to it. When this jondo receives 
the request, it flips a biased coin to determine whether or not to forward the request to another jondo. If the result is to forward, then the jondo selects a random jondo and forwards the request to it. Otherwise, the jondo submits the request to the HTTP server for which the request was destined. Consequently, each request
travels from the user's browser, through a number of 
jondos, and finally to the origin HTTP server. Sub- 
sequent requests initiated at the same jondo follow 
the same path (except perhaps going to a different 
HTTP server), and server response messages traverse 
the same path as the request messages, only in reverse. 
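The path-building rule can be simulated in Python. The sketch takes a global observer's viewpoint and returns the whole path at once; in the real system no single jondo learns the full path, and each jondo only remembers its own successor so that later requests reuse the same path. The function name and the forwarding probability parameter `p_forward` are hypothetical.

```python
import random


def crowds_path(crowd, initiator, p_forward, rng):
    """Simulate the sequence of jondos a request visits before reaching the server."""
    assert 0 <= p_forward < 1, "p_forward must be below 1 for the path to terminate"
    path = [initiator]
    # The initiator always forwards to a randomly chosen jondo (possibly itself).
    current = rng.choice(crowd)
    path.append(current)
    # Each receiving jondo flips a biased coin: forward with probability p_forward.
    while rng.random() < p_forward:
        current = rng.choice(crowd)
        path.append(current)
    return path  # the last jondo submits the request to the origin HTTP server
```

With `p_forward` set to zero, the first randomly chosen jondo always submits the request itself; larger values yield longer paths and make the true initiator harder to single out, at the cost of latency.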

All communication between any two jondos is encrypted using a key known only to the two of them.
Encryption keys are established as jondos join the 
crowd. Therefore, some group membership procedures 
must be defined. Those procedures determine who can 
join the crowd and when they can join, and inform 
members of the crowd membership accordingly. In 
fact, there are many schemes and corresponding group 
membership protocols that could be used to manage 
crowd memberships. While providing robust and reliable distributed solutions, many of these schemes have the disadvantage of incurring significant overhead and
of providing semantics that are too strong for the ap- 
plication at hand. In the Crowds system, a simpler 
and centralized solution is used. As already mentioned 
above, membership in a crowd is controlled and re- 
ported to crowd members by the blender. To make use 
of the blender (and thus the crowd), the user must es- 
tablish an account with the blender, i.e., an account 
name and password that the blender stores. When the 
user starts up a jondo, the jondo and the blender use 
this shared secret (the password) to authenticate each 
other's communication. As a result of this communica- 
tion, the blender may accept the jondo into the crowd 
and add the jondo (i.e., its IP address, port number, 
and account name) to its current list of members, as 
well as reports this list back to the jondo. In addition, the blender generates and reports back a list of shared keys, each of which can be used to authenticate another member of the crowd. The blender then sends each key to the other jondo that is intended to share it (encrypted under the account password for that jondo) and informs the other jondo of the new member. At this point, all members are equipped with the data they need for the new member to participate in the crowd.
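The blender's key distribution can be sketched as follows. This is a toy model under stated assumptions, not the Crowds 1.0 code: password-derived keys use PBKDF2 with the account name as salt (a hypothetical choice), pairwise keys are "encrypted" by XOR with that derived key (insecure once the same derived key protects several keys), and authentication is simplified to a direct password comparison, whereas the real system authenticates the jondo and the blender via the shared secret without sending it.

```python
import hashlib
import secrets


def password_key(account: str, password: str) -> bytes:
    """Derive a 32-byte key from an account password (PBKDF2; salt choice is hypothetical)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), account.encode(), 100_000)


def xor32(a: bytes, b: bytes) -> bytes:
    """XOR two 32-byte strings (toy encryption of a 32-byte key; NOT secure if reused)."""
    return bytes(x ^ y for x, y in zip(a, b))


class Blender:
    def __init__(self):
        self.accounts = {}  # account name -> password
        self.members = {}   # account name -> (IP address, port)

    def register(self, account: str, password: str) -> None:
        self.accounts[account] = password

    def join(self, account: str, password: str, ip: str, port: int):
        """Admit a jondo. Returns (member list, keys for the newcomer, notices for members)."""
        if self.accounts.get(account) != password:
            return None  # authentication failed
        new_key = password_key(account, password)
        keys_for_newcomer, notices = {}, {}
        for other in self.members:
            shared = secrets.token_bytes(32)  # fresh pairwise key
            keys_for_newcomer[other] = xor32(shared, new_key)
            # The existing member gets the same key, protected under its own password.
            other_key = password_key(other, self.accounts[other])
            notices[other] = (account, xor32(shared, other_key))
        self.members[account] = (ip, port)
        return dict(self.members), keys_for_newcomer, notices
```

When a second jondo joins, it and the existing member each decrypt their copy with their own password-derived key and end up holding the same pairwise key, which the blender also knows; this is exactly the TTP role discussed next.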

Each member maintains its own list of crowd members. This list is initialized to the one received from the blender when the jondo joins the crowd, and is updated when the jondo receives notices of new or deleted members from the blender. The jondo can also (autonomously) remove jondos from its list of crowd members if it detects that the corresponding jondos have failed. This allows each jondo's list to diverge from the others' if different jondos have detected different failures in the crowd.

Obviously, a major disadvantage of this centralized approach to group membership management is that the blender is a trusted third party (TTP) for the purposes of key distribution and membership reporting. Techniques exist for distributing trust in such a TTP among many replicas, in a way that the corruption of some fraction of the replicas can be tolerated [4]. In its present, non-replicated form, however, the blender is best executed on a trusted computer system (e.g., with login access available only at the console). Note, however, that even though the blender is a TTP for some functions, HTTP traffic is not generally routed through the blender, and thus a passive attack on the blender does not immediately reveal the users' Web transactions. Moreover, the failure of the blender does not interfere with ongoing transactions. It is planned that in future versions of Crowds, jondos will establish mutually shared keys using the Diffie-Hellman key exchange, where the blender serves only to authenticate and distribute the Diffie-Hellman public values of the Crowds members. This will eliminate the present reliance on the blender for key generation. Another possibility would be the use of Kerberos or another authentication and key distribution system [1].
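The planned Diffie-Hellman approach can be illustrated with a minimal sketch of the key agreement itself. The parameters are toy values chosen only to keep the example short (P is the Mersenne prime 2^127 - 1 and G = 3, far too weak for real use), and the blender's role of authenticating and distributing the public values is not modeled; the function names are hypothetical.

```python
import secrets

# Toy parameters (NOT secure): P is the Mersenne prime 2**127 - 1; group structure is ignored.
P = 2**127 - 1
G = 3


def dh_keypair():
    """Return (private exponent, public value G^private mod P)."""
    private = secrets.randbelow(P - 2) + 1  # private exponent in [1, P - 2]
    return private, pow(G, private, P)


def dh_shared_secret(private: int, peer_public: int) -> int:
    """Combine one's own private exponent with the peer's public value."""
    return pow(peer_public, private, P)
```

Two jondos that exchange only their public values (as relayed and vouched for by the blender) arrive at the same shared secret, while the blender itself never learns it.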

A thorough security and performance analysis of Crowds is given in [5]. Crowds 1.0 is implemented in Perl 5.0. According to its developers, this scripting language was chosen for its rapid prototyping capabilities and its portability across Unix and Microsoft platforms. While the performance of Crowds is already encouraging, it could be further improved by reimplementing the system in a compiled language, such as C or C++. Further information on Crowds and the corresponding software can be obtained from the project's homepage at URL http://www.research.att.com/projects/crowds. Note, however, that due to US export restrictions, the software can be obtained by US and Canadian citizens only.

5. Anonymous publishing 

The technologies overviewed and discussed so far address the problem of how to protect the privacy of Web users and how to provide support for anonymous browsing. In contrast, this section addresses the problem of how to publish data anonymously on the Web. Note that the current WWW architecture provides little support for anonymous publishing. In fact, the architecture fundamentally includes information in the URL that is used to locate resources (most notably the HTTP server name or IP address), and it is very hard for a Web publisher to avoid revealing this information (at least if resources published anonymously must be accessible from standard Web browsers without any special client software or anonymity tool). Also note that the browser privacy problem is orthogonal to the anonymous publishing problem, and that the two problems compose well: if full anonymity is needed, techniques for anonymous browsing will work well in tandem with an infrastructure for anonymous publishing.

In the subsections that follow, two basic technolo- 
gies are presented that can be used to address the 
problem of anonymous publishing on the Web. The 
first technology is rather simple and straightforward, 
whereas the second technology is more complex and 
sophisticated. 

5.1. JANUS

JANUS is a joint research project of the Forschungsinstitut für Telekommunikation (FTK) of Dortmund, Hagen, and Wuppertal in Germany. One of the results of the project is an anonymous publishing service that is currently provided by the FernUniversität Hagen.